Air Quality Data

Rows: 571
Columns: 5
$ pm25      <dbl> 10.827805, 11.583928, 11.261996, 9.414423, 11.391494, 12.384…
$ fips      <int> 1069, 1073, 1089, 1097, 1103, 1113, 1117, 1121, 1125, 1127, …
$ region    <chr> "east", "east", "east", "east", "east", "east", "east", "eas…
$ longitude <dbl> -85.35039, -86.82805, -86.58823, -88.13967, -86.91892, -85.1…
$ latitude  <dbl> 31.18973, 33.52787, 34.73079, 30.72226, 34.50702, 32.37600, …

Boxplot of Air Quality Data

Analysis

The boxplot looks roughly symmetric, with the whiskers at about 5 and 15. There are several outliers above 15 and a few below 5, and Q1, the median (Q2), and Q3 appear roughly evenly spaced.

Boxplot of Air Quality Values by Region

Analysis

Overall, the west has a wider range of air quality values. The east has higher values overall, but a few outliers in the west lie above the highest values in the east.

Qualities Over 15

      pm25 fips region longitude latitude
1 16.19452 6019   west -119.9035 36.63837
2 15.80378 6029   west -118.6833 35.29602
3 18.44073 6031   west -119.8113 36.15514
4 16.66180 6037   west -118.2342 34.08851
5 15.01573 6047   west -120.6741 37.24578
6 17.42905 6065   west -116.8036 33.78331
7 16.25190 6099   west -120.9588 37.61380
8 16.18358 6107   west -119.1661 36.23465

Analysis

All the counties with air quality values over 15 are in the west region. Each has a FIPS code starting with “6”. The codes are stored as integers, so the leading zero of state code 06 is dropped, and all of these counties are in California.

Violin Plot of Data

Analysis

The white dot inside the black rectangle marks the median, and the rectangle spans the interquartile range (IQR). The east therefore has a higher median PM2.5 level than the west, which means worse typical air quality. The violin plot also shows that the west has the larger range. Unlike the boxplot, however, it does not mark individual outliers.

Histogram of Air Quality Data

Analysis

The histogram looks roughly symmetric, although the tail of values from about 13 upward could make it slightly right-skewed.

Histogram of Air Quality Data by Region

Analysis

The east histogram has taller bars because the east has more observations and both panels share the same count axis; its shape is roughly symmetric. The west’s histogram has shorter bars and is skewed right.

Scatterplot of Air Quality vs Latitude

Scatterplot of Air Quality vs Latitude by Region

Analysis

The east has more data points, and they are more tightly clustered than the west’s points, which are more spread out.

---
title: "Assignment6-MidtermDashboard"
output: 
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: minty
      navbar-bg: "purple"
    orientation: columns
    vertical_layout: fill
    source_code: embed 
---

```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
```

Air Quality Data 
===

Column {data-width=250}
---

```{r}
PM<-read.csv("avgpm25.csv")
glimpse(PM)
datatable(PM)
```

Column {data-width=250}
---

Boxplot of Air Quality Data
=== 

```{r}
boxplot(PM$pm25,main="Distribution of Air Quality Values")
```

### Analysis

The boxplot looks roughly symmetric, with the whiskers at about 5 and 15. There are several outliers above 15 and a few below 5, and Q1, the median (Q2), and Q3 appear roughly evenly spaced.
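
The whisker, outlier, and quartile positions above are read off the plot. They can also be checked numerically. The sketch below assumes `PM` was loaded from `avgpm25.csv` as in the first chunk. `summarize_boxplot()` is a hypothetical helper built on base R's `boxplot.stats()`, which computes the statistics `boxplot()` draws. Note that it reports Tukey's hinges, which can differ slightly from the Q1 and Q3 returned by `quantile()`.

```r
# Summarize what boxplot() draws for a numeric vector.
# boxplot.stats()$stats = c(lower whisker, lower hinge, median, upper hinge,
# upper whisker), using the default coef = 1.5; $out holds the points drawn
# individually as outliers. NA values are ignored.
summarize_boxplot <- function(x) {
  s <- boxplot.stats(x)
  list(
    whiskers   = s$stats[c(1, 5)],
    hinges     = s$stats[c(2, 4)],
    median     = s$stats[3],
    # Gaps (lower hinge -> median, median -> upper hinge); similar gaps
    # support the "evenly spaced" reading of the box.
    box_gaps   = diff(s$stats[2:4]),
    n_high_out = sum(s$out > s$stats[5]),
    n_low_out  = sum(s$out < s$stats[1])
  )
}

# Usage, assuming PM was read from avgpm25.csv as in the setup chunk:
# summarize_boxplot(PM$pm25)
```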

Boxplot of Air Quality Values by Region 
===

```{r}
boxplot(PM$pm25~PM$region,main="Distribution of Air Quality Values by Region",xlab="Region",ylab="Air Quality",col="pink")
```

### Analysis

Overall, the west has a wider range of air quality values. The east has higher values overall, but a few outliers in the west lie above the highest values in the east.
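
Per-region summaries make the range and outlier comparison above concrete. The base R sketch below assumes `PM` has the `pm25` and `region` columns shown by `glimpse()`. `region_spread()` and `n_above_max()` are hypothetical helper names.

```r
# Per-region summary of pm25: count, minimum, median, maximum, range width.
region_spread <- function(df) {
  by_region <- split(df$pm25, df$region)  # named list, one vector per region
  t(sapply(by_region, function(v) {
    v <- v[!is.na(v)]
    c(n = length(v), min = min(v), median = median(v),
      max = max(v), range_width = max(v) - min(v))
  }))                                     # rows = regions, columns = statistics
}

# Number of values in region `a` that exceed the largest value in region `b`.
n_above_max <- function(df, a, b) {
  cutoff <- max(df$pm25[df$region == b], na.rm = TRUE)
  sum(df$pm25[df$region == a] > cutoff, na.rm = TRUE)
}

# Usage, assuming PM was loaded as in the setup chunk:
# region_spread(PM)
# n_above_max(PM, "west", "east")
```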

Qualities Over 15
===
Column {data-width=450}
---

```{r}
Fifteen<-filter(PM,pm25>15)
Fifteen
datatable(Fifteen)
```

### Analysis

All the counties with air quality values over 15 are in the west region. Each has a FIPS code starting with "6". The codes are stored as integers, so the leading zero of state code 06 is dropped, and all of these counties are in California.
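
The FIPS reasoning above can be checked directly. A county FIPS code has five digits: a two-digit state code followed by a three-digit county code. `read.csv()` reads the `fips` column as an integer (`<int>` in the `glimpse()` output), so state codes below 10 lose their leading zero. The sketch below uses hypothetical helpers `pad_fips()` and `state_fips()` and assumes the `Fifteen` data frame from the chunk above.

```r
# Restore the 5-digit FIPS string that integer storage truncates (6019 -> "06019").
pad_fips <- function(fips) sprintf("%05d", as.integer(fips))

# Integer division by 1000 drops the 3-digit county part, leaving the state code.
state_fips <- function(fips) as.integer(fips) %/% 1000L

# Usage, assuming Fifteen <- filter(PM, pm25 > 15) as above:
# pad_fips(Fifteen$fips)
# all(state_fips(Fifteen$fips) == 6L)  # TRUE only if every county is in state 06 (California)
```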

Violin Plot of Data 
===

Column {data-width=250}
---

```{r}
# Requires the vioplot package: install.packages("vioplot")
vioplot::vioplot(pm25 ~ region, data = PM,
                 main = "Violin Plot of the Distribution of Air Quality by Region",
                 xlab = "Region", ylab = "Air Quality", col = "pink")
```

### Analysis

The white dot inside the black rectangle marks the median, and the rectangle spans the interquartile range (IQR). The east therefore has a higher median PM2.5 level than the west, which means worse typical air quality. The violin plot also shows that the west has the larger range. Unlike the boxplot, however, it does not mark individual outliers.
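
The violin plot's median dot and IQR box can be cross-checked with a numeric summary. The sketch below assumes `PM` was loaded as above. `region_center()` is a hypothetical helper that computes quartiles with the default `quantile()` method, which may differ slightly from the convention a given plotting function uses.

```r
# Q1, median, Q3, and IQR of pm25 per region: the median corresponds to the
# violin plot's white dot, and Q1 to Q3 to the black box.
region_center <- function(df) {
  by_region <- split(df$pm25, df$region)
  t(sapply(by_region, function(v) {
    # Default quantile() method (type 7); other conventions can differ slightly.
    q <- quantile(v, c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    c(q1 = q[1], median = q[2], q3 = q[3], iqr = q[3] - q[1])
  }))
}

# Usage, assuming PM was loaded as in the setup chunk:
# region_center(PM)
```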

Histogram of Air Quality Data
===

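```{r}
# Mean pm25, computed once; calling mean() inside aes() would draw the same
# line once per row of PM
pm25_mean <- mean(PM$pm25, na.rm = TRUE)

ggplot(PM, aes(x = pm25)) +
  geom_histogram(bins = 30, fill = "blue", color = "black") +  # 30 is the ggplot2 default, set explicitly
  geom_vline(xintercept = pm25_mean, color = "red", linetype = "dashed") +  # dashed red line at the mean
  labs(title = "Histogram of Air Quality Levels", x = "Air Quality", y = "Count") +
  theme_minimal()
```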

### Analysis 

The histogram looks roughly symmetric, although the tail of values from about 13 upward could make it slightly right-skewed.
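
The moment-based sample skewness gives a number for judging whether the histogram is symmetric or skewed right. It is defined as g1 = m3 / m2^(3/2), where mk is the k-th central moment. This is one of several common skewness estimators. The `skewness()` function below is a hypothetical base R helper, and `PM` is assumed to be loaded as in the setup chunk.

```r
# Moment-based sample skewness g1 = m3 / m2^(3/2) (NA values dropped).
# Values near 0 suggest symmetry; positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

# Usage, assuming PM was loaded as in the setup chunk:
# skewness(PM$pm25)
# mean(PM$pm25) - median(PM$pm25)  # a mean above the median is another informal sign of right skew
```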

Histogram of Air Quality Data by Region 
===

```{r}
ggplot(PM, aes(x = pm25)) +
  geom_histogram(fill = "blue", color = "black") +
  facet_wrap(~ region) +
  labs(title = "Histogram of Air Quality by Region", x = "Air Quality", y = "Count")
```

### Analysis 

The east histogram has taller bars because the east has more observations and both panels share the same count axis; its shape is roughly symmetric. The west's histogram has shorter bars and is skewed right.
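
The two claims above, that the east has more observations and that the west is right-skewed, can each be checked per region. The base R sketch below assumes `PM` was loaded as in the setup chunk. `region_counts()` and `region_skewness()` are hypothetical helper names, and the skewness measure is the moment-based g1 = m3 / m2^(3/2).

```r
# Observations per region. facet_wrap() uses a shared ("fixed") y scale by
# default, so the region with more counties can show taller count bars.
region_counts <- function(df) table(df$region)

# Moment-based skewness g1 = m3 / m2^(3/2) of pm25 within each region;
# positive values indicate a longer right tail.
region_skewness <- function(df) {
  g1 <- function(x) {
    x <- x[!is.na(x)]
    d <- x - mean(x)
    mean(d^3) / mean(d^2)^(3 / 2)
  }
  tapply(df$pm25, df$region, g1)
}

# Usage, assuming PM was loaded as in the setup chunk:
# region_counts(PM)
# region_skewness(PM)
```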

Scatterplot of Air Quality vs Latitude 
===

```{r}
ggplot(PM, aes(x = latitude, y = pm25)) +
  geom_point(color = "blue") +
  labs(title = "Scatterplot of Air Quality vs Latitude", x = "Latitude", y = "Air Quality")
```

Scatterplot of Air Quality vs Latitude by Region
===

```{r}
ggplot(PM, aes(x = latitude, y = pm25)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, color = "red") +
  facet_wrap(~ region) +
  labs(title = "Scatterplot of Air Quality vs Latitude by Region", x = "Latitude", y = "Air Quality")
```

### Analysis 

The east has more data points, and they are more tightly clustered than the west's points, which are more spread out.
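
"Closer to each other" and "spaced out" can be quantified with per-region standard deviations along each axis of the scatterplot. The sketch below assumes `PM` has the `latitude`, `pm25`, and `region` columns shown by `glimpse()`. `region_scatter_spread()` is a hypothetical helper, and the correlation column is an added summary of the linear trend that each panel's fitted line shows.

```r
# Per-region number of points, spread along each scatterplot axis, and the
# Pearson correlation between latitude and pm25.
region_scatter_spread <- function(df) {
  by_region <- split(df, df$region)
  t(sapply(by_region, function(d) {
    d <- d[complete.cases(d[, c("latitude", "pm25")]), ]
    c(n       = nrow(d),
      sd_lat  = sd(d$latitude),  # smaller = points closer together horizontally
      sd_pm25 = sd(d$pm25),      # smaller = points closer together vertically
      # Sign and strength of the linear association the lm line summarizes
      r       = cor(d$latitude, d$pm25))
  }))
}

# Usage, assuming PM was loaded as in the setup chunk:
# region_scatter_spread(PM)
```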